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1.  INTRODUCTION 


This  report  provides  a  summary  of  the  work  performed  at  UCLA  under  AFOSR  Grant 
89-0281,  “An  Additive  Turbulent  Decomposition  of  the  Navier-Stokes  Equations 
Implemented  on  Highly  Parallel  Computer  Systems.”  A  quite  detailed  report  of  the  first  six 
months  of  effort  was  submitted  in  the  form  of  an  Annual  Report  at  the  end  of  October,  1989. 
The  research  reported  therein  will  be  only  briefly  summarized  here,  and  the  main  items 
treated  will  represent  work  completed  during  the  period  1  Jan  1990  to  31  Mar  1990.  In  the 
main  body  of  this  report  the  discussions  will  be  mostly  qualitative,  but  an  appendix  is 
provided  to  report  the  details  of  pertinent  analyses. 

The  ideas  embodied  in  the  additive  turbulent  decomposition  (ATD)  were  first  proposed 
by  McDonough,  et  al.  [1,2]  and  developed  further  by  McDonough  and  By  water  [3-5].  The 
goal  was  to  provide  a  turbulence  simulation  technique  similar  to  large  eddy  simulation 
(LES),  but  without  the  well  known  deficiencies.  In  particular,  ATD  employs  no  formal 
filtering  of  the  large-scale  equations  or  modeling  on  the  subgrid  (or  small)  scale.  As  a 
consequence,  the  method  can  be  shown  to  be  formally  consistent  with  the  full  Navier-Stokes 
(N.-S.)  equations,  and  thus  completely  predictive.  In  this  respect,  ATD  is  more  closely 
related  to  direct  numerical  simulation  (DNS)  than  to  LES;  but  the  decomposition,  itself,  is 
reminiscent  of  LES. 

In  Refs.  1  through  5  ATD  was  developed  in  a  rather  heuristic  manner  in  the  context  of 
Burgers’  equation  model  problems.  It  has  been  the  goal  of  the  present  research  to  put  ATD 
on  a  stronger,  more  formal,  mathematical  basis,  and  at  the  same  time  extend  its  application  to 
the  full  N.-S.  equations.  In  particular,  the  two  main  tasks  of  the  proposed  research  were:  1) 
theoretical/computational  analysis  of  ATD  for  Burgers’  equation,  and  2)  sequential 
mainframe  implementation  of  ATD  for  the  3-D  N.-S.  equations.  Both  of  these  tasks  include 
numerous  subtasks,  as  listed  in  the  original  proposal,  McDonough,  et  al.  [6],  and  in  the 
earlier  progress  report,  McDonough,  et  al.  [7]. 

In  this  final  report  we  summarize  the  work  completed  on  these  tasks.  In  Sec.  2  we 
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describe  the  studies  conducted  on  Burgers’  equation;  Sec.  3  contains  an  analysis  of  the  work 
with  the  N.-S.  equations,  and  in  Sec.  4  we  provide  a  summary  of  the  work  on  domain 
decomposition,  which  was  a  major  subtask  of  the  proposed  Burgers’  equation  research. 
Finally,  in  Appendix  A  we  report  the  details  of  the  domain  decomposition  work. 


2.  ANALYSIS  OF  BURGERS’  EQUATION 

The  equation  that  has  been  studied  in  the  present  work  is  of  the  form 

u  1 1/2(0^ =  (1) 

This  equation  is  analogous  in  form  to  the  incompressible  N.-S.  equations,  except  that  is  a 
prescribed  forcing  term,  rather  than  an  unknown  to  be  determined  so  as  to  enforce  mass 
conservation.  During  the  study  reported  here,  four  main  projects  were  completed  using  this 
equation.  These  are:  1)  formal  proof  of  consistency  of  the  additive  decomposition,  and 
construction  of  a  predictor-corrector  algorithm  that  maintains  full  order  accuracy,  2) 
development  of  an  energy  conserving  scheme  for  transferring  large-scale  information  to  local 
small-scale  equations,  3)  derivation  of  an  iterative  scheme  for  guaranteeing  continuous 
recoupling  of  local  small-scale  results  to  obtain  a  global  small-scale  solution,  and  4) 
establishing  a  unique  value  for  the  decomposition  parameter,  denoted  as  3  in  [1-5].  We  note 
that  this  last  result  was  actually  Fust  obtained  for  the  full  N.-S.  equations,  and  so  will  be 
discussed  more  in  Sec.  3.  These  four  results  enable  one  to  prove  formal  consistency  of  the 
ATD  algorithm  with  a  direct  solution  of  (1),  thus  providing  the  desired  mathematical 
foundation.  An  additional  result,  not  directly  related  to  ATD,  per  se,  but  still  of  significance, 
was  also  obtained  during  studies  of  the  previous  year.  An  analysis  was  undertaken  to  attempt 
to  assess  the  effects  of  various  averaging  procedures  leading  to  the  Reynolds  averaged 
equations,  but  again  in  the  simple  context  of  Burgers’  equation  (1).  The  results  obtained 
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from  this  show  that  time  averaging  is  fundamentally  flawed,  while  errors  can  also  be 
introduced  by  ensemble  averaging  if  this  is  done  in  an  improper  manner.  We  will  provide 
further  brief  discussion  of  each  of  these  topics  in  the  subsections  that  follow. 

2. 1  Consistency  and  Accuracy  of  ATD 

To  state  the  main  result  of  this  analysis  we  need  the  following  notations.  We  assume 
the  solution  U(x,t)  if  (1)  can  be  decomposed  as 

U(x,t)  =  u(x,t)  +  u*(x,t) ,  (2) 

and  similarly,  the  forcing  function  is  represented  as 

P(x,t)  =  px(x,t)  +  p*(x,t) ,  (3) 

where  u ,  px  can  be  viewed  as  consisting  of  the  first  few  terms  in  a  Fourier  expansion  (and 
hence  correspond  to  low  frequency,  large-scale  behavior),  and  the  “♦’’-quantities  represent 
the  high  frequency,  small-scale  remainders  of  the  series. 

We  now  substitute  (2)  and  (3)  into  (1),  and  additively  decompose  the  result  to  obtain 

u,  +  l/2(u2)x  +  ( l-p)(u*u),  -  u„  =  -px ,  (large-scale)  (4a) 

u?  +  l/2(u*2)x  +  j3(uu*)x  -  u*  =  -p*  .  (small-scale)  (4b) 

R© 

We  note  that  the  value  of  the  decomposition  parameter  (3  is  arbitrary  with  respect  to  the 
mathematical  validity  of  the  decomposition  itself.  But  it  can  be  shown  via  physical 
arguments  for  the  N.-S.  equations  that  p  =  1/2  should  be  required.  This  can  also  be  argued 
on  the  basis  of  applying  renormalization  group  theory  to  transfer  information  from  the  small 
scale  to  the  large  scale.  In  particular,  when  P  =  1/2  Eqs.  (4a)  and  (4b)  are  identical  in  form. 
It  should  also  be  observed  that  there  is  no  “closure”  problem  associated  with  Eqs.  (4);  there  is 
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one  equation  for  each  unknown. 

We  can  now  state  the  main  consistency  and  accuracy  result.  This  will  be  done  in  the 
context  of  the  following  algorithm. 

Algorithm 

Suppose  n  time  steps  have  been  computed  for  u  and  u*  in  Eqs.  (4). 

Then  the  n+1 A  step  is  obtained  from  the  following  procedure: 

1)  calculate  a  p-l*  order  prediction,  u ,  ofu*‘  using  (4a); 

2)  use  u  in  (4b)  to  solve  for  it'**1  via  a  p*  order  method; 

3)  correct  u  to  obtain  u*‘  using  u  ***'  in  a  p*  order  method  for  Eq.  (4a). 

The  following  theorem  is  proven  in  [7]. 

Theorem.  Let  U(x,t)  be  a  globally  p*  order  approximation  to  the  solution  of  Eq.  (1),  and 
suppose  the  above  algorithm  is  used  to  calculate  u(x,t)  and  u*(x,t)  from  Eqs.  (4).  Then  u  + 
u*  =  U  +  O(kf) ,  where  k  is  the  time  step  used  to  integrate  (4a). 

2.2  Transfer  of  Information  between  Large  and  Small  Scales 

It  is  important  to  note  that  Eq.  (4b)  is  actually  solved  on  numerous  local  subdomains 
via  a  Galerkin  procedure  for  each  separate  one.  This  leads  directly  to  nearly  an  order  of 
magnitude  reduction  in  total  arithmetic,  and  at  the  same  time  to  the  potential  for  massive 
parallelization.  However,  the  local  spectral  representations  must  incorporate  large-scale, 
global  spectral  information,  and  this  must  be  accomplished  so  as  to  conserve  energy. 

We  let  <J>N(x)  denote  the  highest  basis  function  of  the  representation  supported  on  the 

large  scale,  and  let  Clt  denote  the  i*  small-scale  subinterval  such  that  £2.=  [0,1].  It  is 

proven  in  [7]  that  the  correct  lowest  mode  coefficient  on  the  small  scale  is  given  by 
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(5) 


a"=J  “♦k<1x- 

•'Q. 

i 

This  is  easily  approximated  via  numerical  quadrature. 


2.3  Global  Continuity  of  Small-Scale  Solutions 

There  are  several  ways  by  which  the  local  small-scale  solutions  can  be  recoupled  to 
achieve  a  required  degree  of  global  continuity.  The  approach  outlined  here  has  been  used  by 
other  researchers  in  different  applications.  In  [7]  we  provided  a  fairly  detailed  derivation  and 
stated  that  the  result  would  be  a  global  u*  e  C°(0,1) .  (In  fact,  we  can  show  by  a  different 
analysis  that  u*  e  C'(0,1)  holds.)  The  basic  idea  is  to  expand  the  local  representations  of  u* 
by  additional  basis  functions,  and  then  determine  the  Fourier  coefficients  of  these  so  as  to 
obtain  matching  of  solutions  across  subdomain  boundaries.  On  the  subdomain  we  have 
the  two  additional  coefficients 

aL<t)  =  Ai<l>  md  aK+2W  =  Bi«  ~  A,(l)  >  (6) 

where  the  A/s  and  B/s  are  unknown.  Now  At  will  be  chosen  to  be  consistent  with  Dirichlet 
conditions  at  x  =  0,  and  similarly  for  BN  at  x  =  1.  The  general  matching  condition  to 
guarantee  global  continuity  of  u*  is 


Ai+1(t)  =  B.(t) .  (7) 

Typical  implementations  enforce  this  iteratively. 

It  is  worth  mentioning  here  that  recent  work  on  this  problem  indicates  that  a  better 
treatment  may  be  to  transform  all  subdomain  boundary  information  to  the  local  differential 
operators  before  spectral  discretization.  Then  the  subdomain  matching  information  can  be 
obtained  directly  as  the  solution  to  a  tridiagonal  linear  system  for  either  the  A.s  or  B/s . 
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Moreover,  it  is  much  easier  to  maintain  orthonormality  of  subdomain  basis  functions  with 
this  approach. 


2.4  Analysis  of  Averaging  Techniques  Via  Burgers’  Equation 

This  particular  topic  is  not  strictly  a  part  of  the  research  involving  the  ATT)  algorithm. 
It  was,  however,  motivated  by  two  aspects  of  that  work.  First,  because  there  are  numerical 
reasons  for  employing  filters  in  LES,  it  was  desirable  to  study  averaging,  in  general,  to 
understand  its  consequences.  Second,  although  it  is  expected  that  ATD  will  significantly 
expand  the  Re-range  in  which  accurate  turbulence  simulations  can  be  done,  at  present  it  is 
clear  that  most  problems  still  will  not  admit  anything  near  to  DNS.  In  such  cases,  ATD  may 
be  useful  as  a  research  tool  in  the  same  sense  as  is  LES  and  DNS.  Hence,  it  is  important  to 
understand  just  how  ATD  results  should  be  processed. 

The  basic  idea  was  to  use  a  direct  simulation  of  Burgers’  equation  (1)  to  provide 
essentially  exact  Reynolds  stresses  for  use  in  Reynolds  averaged  versions  of  Burgers’ 
equation  via  the  definition 

u’(x,t)  =  U(x,t)  -  U ,  (8) 

where  “  —  ”  denotes  the  same  averaging  used  to  derive  the  Reynolds  averaged  equations,  and 
“  '  ”  denotes  a  fluctuating  quantity.  Our  findings  in  this  study  were  reported  in  [7],  and  in 
more  detail  by  Peng  [8]. 

We  will  simply  list  the  principal  results  here.  First,  if  time  averaging  is  employed,  we 
let  u(x)  denote  the  solution  to  the  Reynolds  averaged  Burgers’  equation  with  Reynolds  stress 
constructed  from  (8);  then  numerical  experiments  show  that  U(x,t)  *  u(x) .  Second,  if 
ensemble  averaging  is  used  equality  does  hold.  This  can  be  shown  both  analytically,  and 
through  numerical  experiments.  Third,  again  using  ensemble  averaging,  but  in  place  of  (8) 
we  use  the  very  natural  although  inconsistent  construction, 
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u'(x»t)  =  U(x,t)  -  u(x) , 


(9) 


where  u(x)  is  a  known  steady  flow  that  is  perturbed,  say  by  a  grid  or  a  vibrating  ribbon,  to 
produce  U(x,t),  then  (U(x,t))  *  (u(x,t))  where  <  • )  denotes  ensemble  averaging. 

These  negative,  and  dismaying,  results  are  important  for  two  reasons.  First,  with 
respect  to  implementation  of  ATD,  and  use  of  results  thereof,  they  show  that  the  equations, 
themselves,  must  not  be  averaged.  If  it  proves  necssary  to  provide  a  large-scale  numerical 
cut-off  in  some  problems,  the  above  results  show  that  the  correct  way  to  do  this  is  to  compute 
the  large-scale  solution,  and  then  average  (or  filter)  the  result.  Second,  we  feel  that  the  basic 
failure  of  the  averaging  process,  itself,  separate  from  any  Reynolds  stress  modeling,  provides 
a  clue  as  to  why  classical  turbulence  closures  seem  never  to  be  robust.  In  particular,  all  effort 
has  been  concentrated  in  the  Reynolds  stress  modeling;  but  the  Reynolds  averaged  equations, 
themselves,  are  evidently  incorrect  if  time  averaging  has  been  employed  (which  is  nearly 
always  the  case),  and  explicit  account  of  this  has  never  been  taken. 


3.  ATD  APPLIED  TO  THE  N.-S.  EQUATIONS 

In  this  section  we  will  discuss  the  work  accomplished  up  to  this  time  on  the  full  3-D 
N.-S.  equations. 


VU  =  0,  (10a) 

U +U*VU-~AU  =  -VP.  (10b) 

K.C 

In  these  equations  U  =  (U,V,W)T  is  the  velocity  vector,  and  P  is  the  pressure.  The  main 
results  that  have  been  obtained  are  the  following;  1)  extension  of  the  consistency  and 
accuracy  theorem  proven  for  Burgers’  equation,  2)  identification  of  a  unique  decomposition 
parameter  (tensor,  in  the  N.-S.  case)  via  momentum  transport  considerations,  3)  literature 
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review  for  the  problem  of  transition  in  pipe  flow,  4)  formal  application  of  ATD  to  the 
cylindrical  coordinate  N.-S.  equations,  and  5)  analysis  of  the  corresponding  small-scale 
equations.  The  details  of  these  studies  are  quite  lengthy  and  have  been  reported  in  [7];  hence, 
they  will  not  be  repeated  here.  But  items  1)  and  2)  are  particularly  significant  from  a 
theoretical  standpoint,  so  a  brief  discussion  will  be  provided. 

As  would  be  expected  from  the  form  of  Burgers’  equation,  the  consistency  and 
accuracy  results  discussed  in  Sec.  2  can  be  proven  for  the  N.-S.  equations  (10).  In  fact  the 
same  line  of  reasoning  given  in  [7]  carries  through.  This  is  a  crucial  theoretical  result 
because  it  shows  that  very  efficient  algorithms  can  be  used  to  implement  the  ATD  procedure. 
There  are,  of  course,  other  alternatives  as  will  be  discussed  below  in  Sec.  4,  and  in  Appendix 
A. 

The  identification  of  a  unique  decomposition  parameter  is  also  of  great  significance. 
In  particular,  the  apparent  nonuniqueness  of  the  decomposition  procedure  was  one  of  the 
shortcomings  discussed  in  [1-5],  and  there  was  early  concern  that  this  might  lead  to  a  closure 
problem.  But  the  analysis  presented  in  [7]  shows  that  the  N.-S.  decomposition  tensor  p 
should  simply  be  the  identity  tensor  I.  This  results  from  very  simple  physical  arguments 
involving  momentum  transport,  and  it  leads  to  a  decomposition  for  which  the  large-  and 
small-scale  equations  are  identical  in  form.  As  mentioned  earlier,  this  permits  application  of 
renormalization  group  techniques  to  transfer  results  from  small  to  large  scale,  and  it  also 
gives  a  unique  value  of  p  =  1/2  for  the  Burgers’  equation  decomposition  parameter. 

Our  remaining  discussions  in  this  section  concern  work  completed  on  the  small-scale 
equations  between  1  Jan  1990  and  31  Mar  1990.  In  our  previous  report  [7],  we  defined  a 
series  of  spectral  basis  functions,  based  on  the  Legendre  polynomials,  and  identified  a  non¬ 
iterative  splitting  procedure  for  integrating  the  small-scale  equations.  The  basis  functions 
were  defined  such  that  all  but  two  of  the  functions  in  a  basis  set  satisfied  homogeneous 
Dirichlet  boundary  conditions,  while  the  remaining  two  in  combination  satisfied  arbitrary 
inhomogeneous  Dirichlet  boundary  conditions.  These  functions  give  rise  to  a  method  which 
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is  a  hybrid  of  a  Galerkin  method  (in  which  all  the  basis  functions  satisfy  the  boundary 
conditions)  and  a  tau  method  (in  which  the  coefficients  of  the  highest  two  modes  are  chosen 
after  the  rest  of  the  coefficients  are  evaluated,  such  that  the  boundary  conditions  are 
satisfied).  The  splitting  method  we  proposed  required  only  Dirichlet  conditions  on  the 
velocity,  which  could  be  easily  met  by  the  functions  in  the  basis  set 

The  basis  functions  and  integration  method  were  selected  with  an  eye  to  the  efficiency 
of  the  overall  algorithm.  A  potentially  time-consuming  operation  is  the  matching  of  the 
small-scale  solutions  at  the  boundaries  between  adjacent  cells,  since  this  requires  information 
on  the  small-scale  boundary  conditions  throughout  the  global  (large-scale)  domain.  The 
inhomogeneous  boundary  information  was  restricted  to  just  two  functions  in  the  basis  set  in 
order  to  minimize  the  arithmetic  involved  in  matching  the  boundary  solutions.  Similarly, 
because  the  small-scale  equations  must  be  solved  repeatedly  in  each  of  many  small-scale 
cells,  we  chose  the  non-iterative  splitting  method  for  the  integration  in  order  to  avoid  the 
extra  arithmetic  of  an  iterative  method.  Note,  however,  that  even  though  the  splitting  method 
is  nominally  non-iterative,  some  iteration  is  required  for  the  accurate  evaluation  of  the  non¬ 
linear  terms. 

Further  examination  of  the  small-scale  problem,  with  the  continuing  goal  of 
minimizing  the  overall  arithmetic,  led  us  to  a  re-evaluation  of  the  basis  functions  and 
integration  algorithm  initially  selected.  This  resulted  in  the  identification  of  a  more  efficient 
integration  algorithm  and  a  different  series  of  basis  functions.  These  are  described  below, 
along  with  the  motivation  for  their  selection. 

Our  reconsideration  of  the  basis  functions  was  prompted  by  our  desire  to  minimize  the 
arithmetic  associated  with  the  evaluation  of  the  nonlinear  terms  in  the  momentum  equation. 
The  evaluation  of  the  (nonlinear)  quadratic  terms  in  a  spectral  representation  of  the  Navier- 
Stokes  equations  must  be  carried  out  via  the  process  of  convolution.  This  is  a  crucial  step, 
and  the  efficiency  of  the  algorithm  may  depend  on  the  efficiency  with  which  the 
convolutions  are  evaluated.  The  most  efficient  methods  for  the  evaluation  of  spectral 
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convolutions  make  use  of  so-called  “fast”  spectral  transforms.  Well-known  fast  transforms 
exist  for  the  trigonometric  (Fourier)  polynomials,  and  for  the  Chebychev  polynomials.  A  fast 
transform  also  exists  for  the  Legendre  polynomials  [9],  but  it  is  more  difficult  and  less  well 
known  than  the  Fast  Fourier  Transform  (FFT).  Furthermore,  all  these  fast  transforms 
manipulate  the  coefficients  of  the  weighted  sums  of  the  individual  spectral  functions  on 
which  the  transform  is  based.  Additional  manipulations  --  and  additional  arithmetic  -  are 
necessary  if  a  Galerkin  or  Galerkin-like  method  has  been  used  in  the  spectral  discretization  of 
the  original  differential  equation,  i.e.,  when  the  spectral  functions  are  used  in  combinations 
such  as  those  we  had  earlier  proposed,  rather  than  being  used  individually. 

These  facts  led  us  to  consider  the  use  of  Chebychev  polynomials  and  a  pure  tau 
method.  The  use  of  a  Chebychev  tau  method  permits  the  convolutions  to  be  carried  out  by 
means  of  Fast  Chebychev  Transforms  (FCTs),  without  the  additional  manipulations  that  a 
Galerkin  method  would  require.  This  requires  0((N-2)log(N))  arithmetic  in  two  dimensions, 
and  0((N-3)log(N))  in  three.  We  then  looked  for  a  method  of  solving  the  equations  as  a 
whole  which  demanded  a  comparable  level  of  arithmetic. 

In  Section  5.1.3  of  their  book,  Canuto,  Hussaini,  Quarteroni,  and  Zang  [10]  give 
several  efficient  methods  for  the  Chebychev  tau  solution  of  Poisson  equations.  We  can  use 
these  methods  provided  that  we  can  manipulate  the  discrete  Navier-Stokes  equations  into 
Poisson  form.  The  derivation  and  use  of  a  pressure  Poisson  equation  to  replace  the  mass 
conservation  equation  is  well  known,  but  due  care  must  be  exercised  in  the  imposition  of  the 
boundary  conditions  (cf.  Gresho  and  Sani  [11]).  It  is  also  possible  to  put  the  discrete 
momentum  equations  in  Poisson  form  if  we  use  implicit  Crank-Nicolson  to  time-discretize 
the  linear  terms,  but  use  a  predictor-corrector  approach  in  the  time-discretization  of  the 
nonlinear  and  pressure  gradient  terms.  An  explicit  treatment  of  these  terms  would  also  work, 
but  a  predictor-corrector  approach  is  preferred  to  a  simple  explicit  treatment  because  the 
corrector  can  be  iterated  an  arbitrary  number  of  times  in  order  to  improve  the  stability  of  the 
integration.  With  the  equations  in  this  form,  the  matrix  diagonalization  approach  described 
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by  Canuto,  et  al.  [10],  may  be  used  to  solve  the  entire  system  in  0(N)  arithmetic  in  both  two 
and  three  dimensions.  In  this  way,  the  solution  of  the  system  of  equations  is  made  almost  as 
efficient  as  the  evaluation  of  the  convolutions.  We  also  note  that  the  use  of  a  predictor- 
corrector  approach  extends  straightforwardly  to  multi-step  methods  of  higher  time-accuracy. 

As  noted  above,  we  had  initially  considered  our  composite  Legendre-based  basis 
functions  in  order  to  facilitate  the  matching  of  the  small-scale  solutions  at  the  subdomain 
(cell)  boundaries.  The  adoption  of  the  Poisson  equation  approach,  however,  also  facilitates 
this  operation  by  opening  up  the  extensive  literature  on  the  solution  of  the  Poisson  equation 
by  means  of  domain  decomposition,  which  necessarily  considers  the  problem  of  solution¬ 
matching  at  the  subdomain  boundaries.  Additionally,  the  boundary  conditions  required  by 
the  pressure  Poisson  equation  [10]  suggest  that  C2-continuity  of  the  velocity  should 
automatically  guarantee  C1 -continuity  of  the  pressure,  and  in  light  of  the  pressure  Poisson 
equation  this  should  guarantee  global  mass  conservation.  We  shall  be  examining  this  further 
in  terms  of  its  relation  to  the  theory  of  domain  decomposition,  the  requirements  for  accurate 
matching  of  the  small-scale  solutions,  and  the  connection  between  the  small  and  large  scales. 

In  summary,  therefore,  the  new  approach  shares  with  the  original  method  the 
characteristics  of  facilitating  the  matching  of  solutions  at  the  small-scale  subdomain 
boundaries  and  using  an  efficient  routine  for  the  small-scale  integrations,  but  in  addition  it 
further  minimizes  the  overall  arithmetic  and  provides  a  stronger  connection  to  the  theory  of 
domain  decomposition. 

We  have  defined  a  test  problem  for  the  small-scale  algorithm:  it  is  a  single  2-D  small- 
scale  cell  with  C1 -periodic  boundary  conditions,  and  with  an  imposed  constant  large-scale 
shear.  This  is  a  single-parameter  problem,  with  the  one  arbitrary  parameter  being  the 
Reynolds  number  defined  in  terms  of  the  cell  height  and  the  large-scale  velocity  gradient 
This  problem  will  be  run  at  a  number  of  different  Reynolds  numbers  in  order  to  map  out  the 
parameter  space  of  qualitative  behavior  of  solutions  in  a  treatment  similar  to  that  of 
McDonough  and  Bywater  [3,4].  Coding  of  this  problem  is  in  progress.  Because  of  our 
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choice  of  algorithms,  this  can  be  directly  extended  to  the  3-D  case,  and  in  this  context  will 
simply  be  a  callable  subroutine  in  the  overall  small-scale  solution  process. 


4.  DOMAIN  DECOMPOSITION 

There  has  been  very  significant  progress  in  the  domain  decomposition  work  during  the 
period  1  Jan  1990  to  31  Mar  1990.  We  will  briefly  describe  the  main  parts  of  this  in  the 
present  section  and  leave  details  to  Appendix  A.  The  viewpoint  taken  is  somewhat  different 
from  that  of  the  work  of  McDonough  and  By  water  [3-5].  In  particular,  ATD  will  here  be 
analyzed  in  the  context  of  a  two-grid  multigrid  method  with  the  large-scale  corresponding  to 
the  coarse  grid,  and  the  small  scale  associated  with  fine  grid  calculations.  Because  of  the 
nonlinearity  of  Burgers’  and  the  N.-S.  equations,  we  have  employed  the  so-called  full 
approximation  scheme  (FAS)  in  the  present  work.  It  is  shown  in  Appendix  A  that  with  this 
approach,  the  differential  equation(s)  remains  the  same  on  all  scales  (as  with  standard  ATD) 
but  requires  a  forcing  term  that  is  different  for  different  scales. 

Two  particular  implementations  employing  domain  decomposition  have  been  studied 
for  Burgers’  equation:  1)  the  Schwarz  alternating  method,  and  2)  a  Gauss-Seidel-Newton 
method.  These  have  been  run  in  a  sequential  manner  to  date,  but  both  are  easily  parallelized. 
We  now  list  the  main  results.  First,  for  the  Schwarz  alternating  method  it  was  found  that 
convergence  of  the  small-scale  recoupling  iterations  (to  the  level  of  discretization  error) 
could  be  achieved  in  a  single  iteration  provided  a  sufficient  number  of  modes  was  carried  on 
each  of  the  small-scale  subdomains.  Conversely,  if  too  few  modes  were  used,  convergence 
occurred  only  to  a  certain  level  (determined  by  the  discretization  error).  Moreover,  it  was 
found  that  the  coarse  grid  results  were  not  even  needed  for  the  model  problems.  It  should  be 
noted  that  this  last  result  is  not  expected  to  hold  for  a  real  physical  flow  field  simulation.  On 
the  other  hand,  the  remaining  observations  show  that  this  approach  holds  much  promise. 

Results  for  the  Gauss-Seidel-Newton  studies  indicate  that  this  approach  is  not  as  robust 
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as  is  the  Schwarz  method.  Its  performance  is  somewhat  more  sensitive  to  time  step  size,  and 
in  general  more  iterations  are  required  per  time  step.  But  the  arithmetic  per  iteration  is  much 
less  than  for  the  Schwarz  method. 

Further  details  of  these  studies,  and  the  conclusions  that  can  be  drawn  from  them,  are 
provided  in  Appendix  A. 

5.  SUMMARY 

In  summarizing  the  work  on  this  project  we  note  that  all  of  the  proposed  work  for 
Burgers’  equation  has  been  completed.  In  addition,  another  major  project  not  originally 
proposed  was  undertaken  and  carried  to  conclusion.  One  Masters  Thesis,  Peng  [8],  and  one 
conference  presentation,  Peng  and  McDonough  [12],  have  resulted  from  this  part  of  the 
research;  also,  two  journal  articles  are  in  progress. 

The  work  associated  with  the  3-D  N.-S.  ATD  implementations  is  not  yet  complete,  but 
good  progress  has  been  made  particularly  with  regard  to  theoretical  aspects.  One  conference 
presentation  has  been  made,  Hylin  and  McDonough  [13],  on  this  portion  of  the  research. 
Furthermore,  the  work  has  proceeded  continuously  beyond  the  31  Mar  1990  project  ending 
date. 

The  domain  decomposition  studies,  although  begun  somewhat  later  than  hoped,  have 
by  now  produced  significant  useful  results.  As  mentioned  in  Sec.  4,  these  have  been  along 
somewhat  different  lines  than  had  originally  been  expected,  but  are  nevertheless  valuable  in 
understanding  the  best  approaches  to  follow  in  ATD  implementations. 
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REPORT  ON 

Studies  of  the  Additive  Turbulent  Decomposition  Method. 
by  Tony  F.  Chan  and  Tarek  Mathew 
Department  of  Mathematics,  UCLA. 

1  Introduction. 

The  Additive  Turbulent  Decomposition  method  (ATD)  is  a  numerical  technique  under 
study  and  development  for  the  purpose  of  computing  qualitatively  accurate  approx¬ 
imations  to  turbulent  solutions  of  the  incompressible  Navier-Stokes  equations.  It  is 
described  in  McDonough  and  By  water  [22],  [23].  In  this  document,  we  report  on  our 
studies  of  the  domain  decomposition  aspects  of  the  ATD  method.  This  was  done 
during  the  grant  period  this  year  (until  March  1990). 

The  rest  of  this  document  is  outlined  as  follows.  First,  we  briefly  describe  the 
original  version  of  the  ATD  method.  This  involves  the  use  of  a  coarse  grid  model  and  a 
fine  grid  model  for  the  discretisation  of  the  problem.  We  then  describe  the  framework 
in  which  we  have  studied  the  ATD  method,  namely  as  a  two  level  multigrid-domain 
decomposition  type  method.  We  also  include  a  discussion  of  the  work  by  Henshaw, 
Kreiss  and  Reyna  [17]  on  the  smallest  scale  for  the  incompressible  Navier-Stokes 
equations  which  has  some  implications  on  the  selection  of  the  discretisation  used  in 
the  ATD  method.  The  original  ATD  method  involved  a  coupling  between  the  coarse 
grid  and  fine  grid  model  equations.  Our  studies  have  led  us  to  incorporate  another 
technique  to  couple  these  equations,  namely,  the  Full  Approximation  Scheme  (which 
is  frequently  used  in  nonlinear  multigrid  methods).  We  describe  this  technique  and 
also  some  domain  decomposition  algorithms  that  are  used  in  conjunction  with  this. 
We  then  present  numerical  results  of  tests  done  using  these  methods  on  a  model 
problem.  Following  that  we  discuss  briefly  the  extensions  of  these  methods  to  the 
case  of  the  Stokes  problem  and  incompressible  Navier-Stokes  equations.  Finally,  we 
describe  some  work  we  plan  to  do  next  year. 
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2  The  ATD  method. 


Here  we  briefly  recall  the  original  version  of  the  ATD  method  of  McDonough  and 
By  water  [22],  [23].  Consider  an  evolution  equation: 

Ut  +  L(U)  =  F,  (1) 

for  a  nonlinear  elliptic  operator  L(.),  forcing  F,  and  appropriate  boundary  conditions 
and  initial  data.  The  solution  U  is  decomposed,  in  some  suitable  way,  as  the  sum  of  a 
large-scale  component  Ui  (representing  the  large-scale  features  of  the  solution)  and  a 
small-scale  component  U ,  (representing  the  small-scale,  local  features  of  the  solution) 
evolving  on  a  faster  time  scale: 

u  =  u,  +  us. 

Using  this,  we  can  rewrite  (1)  as  a  pair  of  equations  for  Ua  and  Ur. 

Zfr  +  hiU'  +  Ut)  =  F. 

f L  +  L2(Ut  +  U,)  =  Ft  U 

where  Li(.)  and  L2{.)  are  nonlinear  operators  that  sum  to  L(.)  (see  [22],  [23]  for 
their  definition),  and  F  =  F3  +  Ft.  The  equation  for  the  small-scale  component  U, 
is  posed  locally  on  a  partition  of  the  domain  and  is  discretised  locally  on  each  of  the 
subregions  using  a  suitable  method.  The  equation  for  the  large-scale  component  Ui  is 
discretised  on  a  coarse  grid.  Since  the  two  equations  in  (2)  are  coupled,  an  iterative 
procedure  must  be  used  to  solve  them,  e.g.,  by  alternatingly  freezing  one  component 
while  updating  the  other.  Of  course,  the  splitting  L  =  L\  4-  L2  must  be  chosen  so 
that  we  obtain  a  convergent  method  and  so  that  the  interaction  between  the  large- 
scale  and  small-scale  components  of  the  solution  are  well  represented.  On  this  issue, 
we  were  led  to  consider  a  frequently  used  procedure  in  nonlinear  multigrid  methods, 
known  as  the  full  approximation  scheme.  We  describe  this  in  a  later  section. 

3  A  framework  for  our  studies. 

The  ATD  method  represents  a  remodelling  of  the  discretisation  process  of  the  dif¬ 
ferential  equations,  as  well  as  a  divide  and  conquer  type  strategy  to  compute  the 
solution  of  the  discrete  equations.  The  discretisation  involves  two  grids,  a  fine  grid 
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and  a  coarse  grid,  and  the  iteration  involves  the  solution  of  subproblems  on  subdo¬ 
mains.  In  the  case  of  direct  simulations  of  turbulence,  the  fine  grid  problem  is  solved 
exactly,  to  obtain  the  fine  grid  solution  Uh,  where  h  denotes  the  size  of  the  fine  grid. 
The  choice  of  h  must  be  based  on  the  smallest  scale  of  the  flow:  h  <  hmin,  which  is 
discussed  in  Henshaw,  Kreiss  and  Reyna  [17].  There  they  prove  that  the  minimum 
scale  associated  with  two  and  three  dimensional  turbulent  flows  of  the  incompressible 
Navier-Stokes  equations,  is  inversely  proportional  to  the  square  root  of  the  Reynolds 
number  based  on  the  kinematic  viscosity  and  the  maximum  of  the  velocity  gradients. 
Thus,  for  large  Reynolds  numbers,  the  solution  of  the  fine  grid  problem  may  be  pro¬ 
hibitively  expensive.  Furthermore,  solving  a  coarse  model  problem  for  Uh,  though  of 
moderate  expense,  may  not  be  accurate  as  mentioned  above.  In  the  context  of  other 
methods  to  simulate  or  model  turbulence,  such  as  Large  Eddy  Simulation  (LES),  see 
[10],  [11],  the  ATD  method  was  initiated  with  the  goal  of  computing  an  approxima¬ 
tion  Uh,H  to  the  fine  grid  solution  Uh,  based  on  the  fine  grid,  at  a  cost  considerably 
less  than  direct  simulation,  and  more  accurate  than  the  coarse  grid  solution  Uh- 

These  considerations  led  us  to  formulate  the  ATD  method  in  the  framework  of 
a  two-level  multigrid  method,  in  which  the  smoothing  (a  procedure  which  efficiently 
captures  the  small-scale  or  high  frequency  local  features  of  the  solution)  is  done  by 
a  domain  decomposition  method.  And  in  accordance  with  our  framework,  we  will 
use  terms  such  as  small-scale  and  fine  grid  interchangeably,  as  well  as  large-scale  and 
coarse  grid. 

However,  this  point  of  view  needs  modification  (though  we  have  not  focused  on 
this  so  far),  since  unlike  domain  decomposition  methods  for  stationary  problems, 
the  presence  of  two  spatial  scales  which  may  evolve  at  different  time  scales  adds 
an  extra  structure  to  the  problem.  For  instance,  it  is  observed  that  following  an 
initial  time  range  during  which  the  flow  is  highly  random  (maximally  dissipative), 
the  flow  organises  itself  into  coherent  structures  (mature  flows)  with  some  noticeable 
large-scale  features.  The  large-scale  features  may  evolve  at  a  slower  time  scale  than 
the  small  scale  features.  Thus,  to  increase  the  efficiency  of  a  numerical  technique 
simulating  turbulence,  the  technique  should  be  adapted  to  the  regime  of  the  flow.  We 
hope  to  address  this  issue  in  our  future  studies  making  use  of  quantitative  bounds  for 
the  minimum  scale  based  on  maximum  velocity  gradients  (as  presented  in  [17]),  to 
adaptively  coarsen  or  refine  the  mesh.  We  also  intend  to  test  the  use  of  different  time 
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steps  corresponding  to  different  spatial  scales  in  the  framework  for  the  ATD  method, 
as  such  techniques  have  been  used  in  some  multigrid  CFD  codes  for  compressible 
flows,  see  Jameson,  Schmidt  and  Turkel  [18].  We  hope  to  explore  this  in  the  context 
of  the  incompressible  Navier-Stokes  equations. 

In  the  case  when  more  explicit  information  about  the  nature  of  the  flow  is  known, 
such  as  when  the  flow  is  locally  homogeneous,  then  it  may  be  possible  to  solve  exactly 
for  just  one  of  the  subregions  to  form  a  local  subgrid  turbulence  model  and  use  this 
information  to  form  a  composite  small-scale  model  on  the  whole  domain,  with  a  cost 
that  is  considerably  reduced.  Of  course,  a  coarse  model  would  be  needed  in  addition, 
to  capture  the  large-scale  features  of  the  flow.  Such  features  can  be  tested  numerically, 
in  the  corresponding  regimes  for  the  flow. 

Another  issue  in  constructing  a  framework  for  the  ATD  method,  is  on  the  choice 
of  discretisation.  Results  of  numerical  experiments,  presented  in  Browning  and  Kreiss 
[5],  on  the  simulation  of  turbulence  for  the  incompressible  Navier-Stokes  equations, 
indicate  that  the  use  of  some  lower  order  finite  difference  schemes  in  space  with  the 
same  number  of  nodes  as  a  Fourier  pseudo-spectral  method,  integrated  in  time  using  a 
fourth  order  predictor-corrector  scheme,  remained  accurate  for  short  time  ranges  but 
became  inaccurate  after  long  time  integrations,  whereas  higher  order  finite  difference 
schemes  and  the  pseudo-spectral  method  remained  accurate.  Such  results  indicate 
the  importance  of  the  selection  of  discretisations  in  the  ATD  method. 

4  The  Full  Approximation  Scheme. 

As  mentioned  earlier,  the  goal  of  the  ATD  method  is  to  efficiently  compute  an  approx¬ 
imation  to  the  fine  grid  solution  £4  based  on  the  coarse  grid  and  fine  grid  equations. 
The  coupling  between  the  two  scales  is  represented  in  equation  (2).  However,  an 
optimal  choice  for  the  forcing  terms  or  for  the  splitting  L  =  L1+L2  to  couple  the  two 
scales,  is  not  clear.  This  led  us  to  incorporate  into  the  ATD  framework  a  technique 
commonly  used  in  nonlinear  multigrid  methods  to  couple  the  fine  grid  and  coarse 
grid  problems,  namely,  the  Full  Approximation  Scheme  (FAS).  The  FAS  has  some  of 
the  above  desired  features,  see  Brandt  [3]  and  Hackbusch  [16].  We  briefly  describe 
it  here  for  a  stationary  problem,  though  it  holds  virtually  without  change  for  time 
dependent  problems,  when  an  implicit  time  stepping  method  is  used. 
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Assume  that  the  nonlinear  equation  L(U )  =  F  is  discretised  on  two  grids,  a 
fine  grid  Lh(Uh)  =  Fh  and  a  coarse  grid  LH(UH)  =  FH .  We  use  Ifa  to  denote  an 
interpolation  map  from  the  coarse  grid  to  the  fine  grid,  and  Iff  to  denote  a  restriciton 
map  (possibly  the  adjoint  map  of  Ifo)  from  the  fine  grid  to  the  coarse  grid.  Given 
an  inexpensive  procedure  for  computing  an  approximate  solution  Vh  to  the  fine  grid 
equations,  which  captures  the  local  features  of  the  solution  (by  a  suitable  procedure, 
often  referred  to  as  smoothinq),  we  solve  for  the  correction  Uh  —  Vh  on  the  coarse 
grid,  in  an  indirect  way.  We  solve  for  an  unknown  W^ull  on  the  coarse  grid  by  exactly 
solving 

£"(«%>)  =  L”VHvk)  +  IhlFh  -  £*(V‘)]. 

Here  W^n  represents  an  approximation  to  IffUk,  and  is  therefore  called  a  full  approx¬ 
imation  on  the  coarse  grid.  The  solution  on  the  fine  grid  is  then  updated  or  improved 
by  modifying  it  as: 

KL «—  V*  +  Ibiwfa  - 

Following  this,  the  smoothing  procedure  can  be  repeated  once  again  on  the  fine  grid, 
to  produce  an  update. 

Note  that  the  nonlinear  equations  remains  the  same  on  both  the  coarse  and  fine 
grids,  for  all  iterations,  while  the  forcing  changes.  Furthermore,  heuristic  reasons  can 
be  given  to  show  that  an  appropriate  forcing  for  the  large-scale  equations  is  the  one 
given  above,  see  [3],  [16].  Finally,  we  mention  that  if  the  restriction  map  is  chosen 
to  be  the  orthogonal  projection  onto  the  coarse  grid,  then  upon  convergence,  the  full 
approximation  Wfijf  becomes  the  orthogonal  projection  of  the  exact  fine  grid  solution 
onto  the  coarse  grid,  even  for  nonlinear  equations. 

5  Solution  of  small-scale  equations. 

In  this  section,  we  discuss  how  approximate  solutions  to  the  fine  grid  problem  are 
obtained  using  two  types  of  smoothers  (procedures  which  capture  the  local  or  small- 
scale  high  frequency  features  of  the  solution).  We  describe  a  domain  decomposition 
method  known  as  the  Schwarz  alternating  method  (and  also  mention  a  variant  of  it  for 
parabolic  problems  due  to  Kutznetsov)  as  well  as  a  nonlinear  Gauss-Seidel-Newton 
method.  Both  these  are,  of  course,  used  in  conjunction  with  the  FAS  on  the  coarse 
grid  to  obtain  an  approximation  to  both  the  small-scale  and  large-scale  features  of 
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the  solution.  The  use  of  domain  decomposition  usually  leads  to  easily  parallelisable 
algorithms.  In  addition,  the  Schwarz  alternating  method  is  also  applicable  to  the 
Stokes  equations  (in  which  case  convergence  proofs  are  known,  see  Lions  [20]  and 
Mathew  [21]),  as  well  as  to  the  Navier-Stokes  equations. 

5.1  The  Schwarz  alternating  method. 

We  now  describe  the  Schwarz  alternating  method  in  the  case  of  an  elliptic  problem.  Its 
generalisation  to  the  case  of  discrete  versions  of  elliptic  problems  and  to  the  implicit 
time  discretisations  of  parabolic  problems  is  immediate.  Consider  the  elliptic  problem: 

L(U)  =  F  in  ft, 

with  appropriate  boundary  conditions  on  dCl,  where  L(.)  is  a  nonlinear  elliptic  oper¬ 
ator,  and  fi  is  the  domain  of  the  problem.  Let  f l  be  partitioned  into  ns  overlapping 
subdomains  {fi,}: 

n  =  U£aO,. 

Given  an  approximation  Ui  to  U  satisfying  the  boundary  conditions,  we  construct  a 
new  approximation  t/,+1  in  ns  fractional  steps  as  follows.  For  j  =  1  to  ns,  solve 

L(Wj)  =  F  in  ft,, 

with 

Wj  Ian,  =  C^lan,, 

and  define 

=  Wj  in  ft,, 

Ui+is  =  in  ft  -  ft,  . 

The  method  presented  above  (which  is  sort  of  a  block  Gauss-Seidel  method)  is 
the  basic  version  of  the  Schwarz  alternating  method,  and  it  is  sequential  as  we  have 
described  it.  Note  that  no  global  linearisations  of  the  nonlinear  equations  is  required. 
In  the  case  of  linear  problems,  elliptic  or  parabolic,  there  is  a  highly  parallelisable  ver¬ 
sion  of  this  algorithm  (block  Jacobi),  which  can  also  be  applied  to  global  linearisations 
of  nonlinear  problems,  see  Dryja  and  Widlund  [9],  Cai  [6]. 

However,  even  the  Gauss-Seidel  version  is  easily  parallelisable  via  the  technique 
of  multicoloring.  We  briefly  describe  this  with  an  example.  Consider  the  case  where 
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fi  is  the  unit  square.  To  obtain  {()_,},  we  first  partition  0  into  a  mesh  of  equally 
sized  subsquares  {fij}  without  overlap.  Following  that,  we  enlarge  each  square  to  a 
rectangle  so  that  an  overlapping  collection  {Qj}  is  formed.  Then,  the  subdomains 
{Q'  }  are  grouped  into  four  colors,  each  color  consisting  of  disjoint  subdomains  on 
which  local  problems  can  be  solved  in  parallel.  Thus,  the  number  of  sequential  steps 
in  this  algorithm  can  be  reduced  to  four.  See  Figure  1. 

For  linear  elliptic  problems,  the  algorithm  can  be  shown  to  have  a  geometric  rate 
of  convergence  for  suitable  discretisations,  see  Lions  [20],  Dryja  and  Widlund  [9], 
Bramble,  Pasciak,  Wang  and  Xu  [2],  and  Mathew  [21],  i.e., 

\\U  -  i/‘+1||  <  p'\\U  -  ^°||, 

for  a  constant  p  <  1  and  independent  of  h.  However,  as  the  number  of  subdomains  ns 
increases,  the  rate  of  convergence  of  the  iteration  decreases,  due  to  the  global  domain 
of  dependency  of  elliptic  problems.  This  deterioration  in  the  rate  of  convergence  as 
the  number  of  subdomains  is  increased,  is  usually  eliminated  when  the  algorithm  is 
used  with  a  coarse  grid  in  the  FAS  context.  Numerical  results  are  presented  for  a 
model  problem,  in  the  next  section. 

5.2  Nonlinear  Gauss-Seidel  Newton  method. 

Here  we  briefly  describe  the  Gauss-Seidel-Newton  method,  which  is  commonly  used 
as  a  smoother  for  multigrid  methods  for  nonlinear  elliptic  problems,  see  Brandt  [3] 
and  Hackbusch  [16].  It  is,  in  a  way,  a  limiting  case  of  the  Schwarz  alternating  method, 
when  each  subdomain  is  chosen  to  contain  just  one  interior  unknown.  Let  M{U)  =  F 
denote  a  system  of  N  nonlinear  equations  for  N  real  unknowns  U  =  («i, . . . ,  un). 
Given  an  approximate  solution  V'  ss  U,  we  construct  the  next  approximate  solution 
P‘+1  in  N  fractional  steps.  During  each  fractional  step,  just  one  unknown  is  updated 
by  solving  one  of  the  equations  in  which  all  other  unknowns  are  frozen.  Since  each 
equation  is  nonlinear,  each  fractional  step  may  involve  the  solution  of  one  nonlinear 
equation  in  one  unknown,  say  by  Newton’s  method.  More  generally,  just  one  step 
of  the  Newton  iteration  can  be  used.  This  is  just  a  generalisation  of  the  standard 
Gauss-Seidel  method  for  symmetric  positive  definite  linear  systems,  to  the  case  of 
nonlinear  systems.  An  advantage  of  the  Gauss-Seidel-Newton  method  is  that  no 
global  linearisation  of  M(U)  is  required,  rather  only  linearisations  of  scalar  equations. 
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5.3  An  additive  Schwarz  type  method  of  Kutznetsov. 

The  time  dependent  nature  of  our  problem  can  also  be  exploited  to  derive  more 
efficient  domain  decomposition  smoothers.  Here  we  briefly  describe  such  an  algorithm, 
requiring  no  coarse  model,  for  linear  parabolic  problems,  due  to  Kutznetsov  [19],  see 
also  Meurant  [24].  Of  course,  this  algorithm  can  also  be  applied  to  global  linearisations 
of  nonlinear  problems.  Consider  equation  (1)  for  a  linear  elliptic  operator  L(.).  If  an 
implicit  discretisation  in  time  is  used,  such  as  backward  Euler,  the  resulting  stationary 
problems  have  the  form: 

(I  +  AtL)Un+1  =  F  =  Un  +  AtFn+\  (3) 

where  At  is  the  time  step.  In  the  method  of  Kutznetsov,  the  domain  is  subdivided 
into  ns  nonoverlapping  subdomains  { fly } ,  and  the  forcing  F  is  decomposed  as  a  sum 
of  functions  {Fj}: 

F  =  Fx  +  ...  +  Fna, 

where  Fj  has  support  in  f lj  or  a  small  extension  of  f lj.  Following  that,  given  a  small 
positive  number  e,  say  on  the  order  of  At  or  (A t^2  ..  ,h  subdomain  f lj  is  enlarged 

to  f V-  by  an  amount  proportional  to  y/At  loge,  and  ns  subproblems  axe  solved  in 
parallel  on  each  subdomain  Q'  with  zero  boundary  conditions: 

(/  +  AtL)Wj  =  P)  in  O', 

with  Wj\ao'}  =  0.  Then,  an  approximate  solution  to  equation  (3)  is  obtained  by  adding 
the  { VVy } .  It  can  be  shown  that 

||(VFj  -4-  .  .  .  +  Wna)  —  Un+1 1|  =  0(e), 

by  using  the  decay  property  of  the  Green’s  funtion  for  this  problem,  see  Kutznetsov 
[19]. 


5.4  Other  methods. 

In  our  studies,  we  also  considered  other  domain  decomposition  methods.  In  par¬ 
ticular,  we  considered  the  patching  method,  see  Canuto,  Hussaini,  Quarteroni  and 
Zang  [7].  However,  the  technique  to  patch  together  the  subdomain  solutions  is  quite 
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complicated,  especially  for  two  or  three  dimensional  domains,  and  so  we  preferred  us¬ 
ing  the  Schwarz  alternating  method  and  the  nonlinear  Gauss-Seidel  method  instead. 
More  varieties  of  domain  decomposition  algorithms  exist,  see  Glowinski  et  al  [13]  and 
Chan  et  al  [8]. 

6  Numerical  tests  on  a  model  problem. 

In  this  section,  we  report  on  the  results  of  some  numerical  tests  done  on  a  model 
problem,  namely,  the  viscous  Burgers’  equation  for  a  large  parameter  Re. 

[  Ut  4-  UUX  =  jfeUxx  "H  fill x)  Vx  £  (— 1,1);  t  >  0 

{  U(t  =  0,x)  =  U0(x)  Vx  €  (—1, 1)  (4) 

[  U(t,-1)  =  gL(t),  U(t,+\)  =  gR(t)  Vt  >  0. 

In  our  previous  notation  the  above  problem  would  read 

Ut  +  L(U)  =  F, 

where  the  nonlinear  elliptic  operator  L(.)  is 

L(U)  =  j±Uxx  +  UUX. 

We  discretised  Burgers’  equation  by  using  backward  Euler  (and  occasionally  Crank- 
Nicolson)  in  time  and  Chebychev  collocation  in  space,  for  each  of  the  subdomains 
and  for  the  coarse  model.  In  all  of  the  tests,  we  present  the  results  for  the  solution 
of  the  implicit  equations 

U  +  A  tL{U)  =  F,  (5) 

which  occurs  at  every  time  step  (similar  equation  for  Crank-Nicolson).  This  was  done 
to  study  the  efficiency  of  the  various  methods  foi  a  single  time  step.  Exact  solutions 
were  chosen  with  a  sufficiently  large  number  of  oscillations. 

We  compute  the  relative  error  using  the  exact  solution  to  the  stationary  prob¬ 
lem  (5).  In  the  tables,  we  list  the  number  of  iterations  required  to  decrease  the  rel¬ 
ative  error  in  the  computed  solution  by  a  factor  At/10  (since  we  used  the  backward 
Euler  method  which  is  accurate  to  order  one  in  At). 

We  present  numerical  results  for: 

1.  The  ATD  method  based  on  the  Schwarz  alternating  method  (in  the  context  of 
the  FAS),  see  table  1.  Exact  solvers  were  used  for  the  subdomain  problems. 
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2.  The  ATD  method  based  on  the  nonlinear  Gauss-Seidel-Newton  method  (in  the 
context  of  the  FAS),  see  table  2. 

3.  Kutznetsov’s  method  for  a  convection-diffusion  problem,  see  table  3. 

We  now  mention  some  details  about  the  choice  of  the  mesh  parameters.  For  more 
details,  see  Gottlieb  and  Orszag  [14],  and  Canuto,  Hussaini,  Quarteroni  and  Zang 
[7].  The  number  of  modes  used  in  the  Chebychev  collocation  method  was  based 
upon  stability  considerations,  since  for  large  Reynolds  numbers,  Re  1,  there  can 
be  development  of  boundary  layers,  and  this  can  cause  instability  in  the  spectral 
discretisation  unless  the  number  of  modes  used,  N ,  is  sufficiently  large  (see  Gottlieb 
and  Orszag  [14],  page  140).  The  restriction  is  that  N  >  Ncruicah  where 

N critical  ~  3 \Jlie  || f^||mar- 


For  large  Reynolds  numbers,  the  coarse  model  can  become  unstable  if  a  small  number 
of  modes  is  chosen.  In  such  cases,  we  used  artifical  viscosity,  i.e.,  decreased  the 
Reynolds  number  on  the  coarse  model,  to  obtain  stable  discretisations. 

Our  choice  of  the  time  step  was  based  on  the  scale  of  changes  of  the  flow  on  the 
computational  grid.  This  was  roughly  chosen  to  correspond  to  the  CFL  condition  for 
the  hyperbolic  equation 

Ut  +  Ux  =  0, 


which  is  a  model  for 


Ut  +  UUX  =  0, 


(which  is  obtained  as  Re  — ►  oo).  For  Chebychev  collocation  methods,  this  is  given 
by: 

(6) 

for  some  positive  constant  c  (which  we  set  as  1  for  our  tests).  In  some  of  our  tests, 
we  also  used  A t  =  jj  and  At  =  We  contrast  this  with  the  time  step  restriction 
for  explicit  schemes: 


At  < 


N*' 


since  the  eigenvalues  of  the  discrete  Chebychev  collocation  approximation  to  elliptic 
operators  in  one  dimension  have  modulus  on  the  order  of  N 4,  where  N  is  the  number 
of  modes  used,  see  [7]. 
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Table  1:  ATD  USING  SCHWARZ  METHOD. 


At 

Re 

SMOOTHINGS 

SUBDOMAINS 

ITERATIONS 

B1M 

1 

4 

4 

1 

um 

1 

4 

6 

1 

IBM 

1 

4 

8 

1 

IBM 

1 

4 

10 

1 

um 

1 

4 

12 

1 

um 

1 

4 

14 

1 

UIM 

1 

4 

16 

1 

real 

10 

4 

4 

i 

Dm 

10 

4 

6 

i 

Dm 

10 

4 

8 

i 

iwai 

10 

4 

10 

l 

■WBB 

10 

4 

12 

i 

VIM 

10 

4 

14 

i 

real 

10 

4 

16 

i 

Dm 

100 

2 

8 

i 

cm 

100 

4 

8 

i 

IBM 

100 

2 

16 

i 

■Wttl 

100 

4 

16 

i 

w»aa 

100 

2 

24 

l 

IBM 

100 

4 

24 

l 

■XfiSX 

100 

2 

32 

l 

Baa 

100 

4 

32 

l 

um 

1000 

2 

8 

l 

roaa 

1000 

4 

8 

l 

D2Q1 

1000 

2 

16 

l 

ehbi 

1000 

4 

16 

l 

Dzai 

1000 

2 

24 

l 

um 

1000 

4 

24 

l 

um 

1000 

2 

32 

l 

um 

1000 

4 

32 

l 
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Table  2:  ATD  USING  GAUSS-SEIDEL-NEWTON  METHOD. 


N coarse 

At 

Re 

8 

mmm\ 

1 

8 

Mjmm 

1 

8 

l/N2 

1 

8 

mm 

1 

8 

Wltm 

1 

8 

■BIB 

10 

8 

wnm 

10 

8 

wnm 

10 

8 

mam 

100 

8 

warn 

100 

8 

wnm 

100 

8 

wnm 

100 

8 

l/N2 

100 

32 

wnm 

1000 

32 

wnm 

1000 

32 

■BIB 

1000 

32 

B20S 

1000 

32 

wnm 

1000 

8 

l/N 

100 

8 

l/N 

100 

8 

mm 

100 

8 

l/N 

100 

8 

l/N 

100 

32 

l/N 

1000 

32 

l/N 

1000 

32 

l/N 

1000 

32 

mm 

1000 

Re  I  SMOOTHINGS  I  ITERATIONS 
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Table  3:  KUTZNETSOV’S  METHOD. 


EB1 

At 

Re 

OVERLAP  N0 

CONVECTION  c 

ERROR 

32 

mum 

1 

1 

1.0 

1.3e-l 

32 

mwmm 

1 

5 

1.0 

l.le-3 

32 

mmm 

10 

1 

1.0 

1.0e-2 

32 

mmm 

10 

5 

1.0 

6.8e-4 

32 

WTRIM 

100 

1 

1.0 

4.2e-3 

32 

100 

5 

1.0 

6.3e-4 

128 

wannum 

1000 

1 

1.0 

2.0e-3 

128 

MTNIM 

1000 

5 

1.0 

3.1e-4 

32 

mw$m 

10 

1 

1.0 

4.5e-4 

32 

wmmm 

10 

5 

1.0 

l.le-5 

32 

wiKvmtm 

100 

1 

1.0 

5.9e-5 

32 

rniMwmM 

100 

5 

1.0 

6.8e-6 

128 

giiimai 

1000 

1 

1.0 

3.6e-5 

128 

tftonrc* 

1000 

5 

1.0 

4.0e-6 

32 

WTftIM 

100 

1 

0.5 

3.6e-3 

32 

mmm 

100 

5 

0.5 

3.2e-4 

128 

mum 

1000 

1 

0.5 

2.0e-3 

128 

mmm 

1000 

5 

0.5 

2.2e-4 

32 

mam 

100 

1 

0.0 

3.3e-3 

32 

mmm 

100 

5 

0.0 

8.0e-5 

128 

mssm 

1000 

1 

0.0 

2.0e-3 

1000 

5 

0.0 

1.8e-4 

We  now  briefly  discuss  the  results.  First,  we  consider  the  ATD  method  based 
on  the  Schwarz  method,  see  table  1.  In  this  case,  our  choice  of  subdomains  Qj  was 
non-overlapping  intervals  of  equal  length.  These  were  then  extended  halfway  into 
the  adjacent  subintervals  to  form  the  overlapping  collection  {  Q'} .  The  intervals  were 
grouped  into  two  colors,  providing  an  easily  parallelisable  algorithm.  We  found  that 
the  number  of  modes  used  locally,  in  each  subinterval  has  a  critical  effect  on  the  con¬ 
vergence  of  the  Schwarz  method,  i.e.,  if  the  number  of  modes  used  in  each  subinterval 
were  below  a  critical  number  (depending  on  the  smoothness  of  the  exact  solution), 
then  the  Schwarz  iteration  did  not  reduce  the  error  beyond  a  certain  point.  However, 
if  the  number  of  modes  used  locally  were  increased,  then  the  method  would  converge 
geometrically.  This  indicated  that  the  discretisation  error  prevented  convergence 
beyond  a  certain  point,  when  the  number  of  modes  used  locally  were  insufficient. 
In  addition,  we  have  found  that  for  the  choice  of  time  steps  we  used,  the  Schwarz 
method  converged  to  within  discretisation  error  in  one  iteration  (provided,  the  num¬ 
ber  of  modes  used  locally  were  sufficient)  and  that  the  coarse  model  was  not  needed. 
In  table  1  we  list  the  number  of  iterations  required  to  reduce  the  relative  error  of  the 
computed  solution  by  a  factor  of  A/10,  however,  it  was  observed  that  the  relative 
error  was  usually  reduced  by  a  factor  much  smaller  than  A/10.  We  have  also  listed 
the  number  of  smoothings  used,  i.e.,  the  number  of  times  the  Schwarz  iteration  was 
used  before  the  coarse  model  correction  was  done.  For  the  particular  choice  of  time 
steps,  the  rate  of  convergence  is  independent  of  the  number  of  subdomains  and  the 
Reynolds  number.  However,  as  At  — ♦  1,  the  rate  of  convergence  became  geometric 
and  was  much  slower  than  for  smaller  time  steps. 

The  next  group  of  tests  listed  are  for  the  ATD  method  based  on  the  nonlinear 
Gauss-Seidel-Newton  method,  which  is  a  limiting  case  of  the  Schwarz  method  applied 
to  individual  unknowns  rather  than  to  blocks  of  unknowns.  Each  scalar  nonlinear 
equation  was  solved  locally  by  using  a  Newton  iteration.  Here,  it  was  found  that 
for  large  Reynolds  numbers  the  iterations  sometimes  diverged  when  the  time  steps 
were  not  small  enough,  see  table  2.  Moreover,  as  the  time  step  grew  bigger,  the 
number  of  iterations  required  also  increased.  This  may  be  related  to  the  lack  of 
diagonal  dominance  in  the  resulting  system.  However,  for  the  time  steps  At  =  -fa 
and  At  =  j/jj,  the  number  of  iterations  required  to  reduce  the  initial  relative  error 
in  the  solution  by  a  factor  At/ 10  decreased  as  the  number  of  smoothing  iterations 
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were  increased.  The  computational  cost  for  one  iteration  of  this  version  of  the  ATD 
was  considerably  less  than  for  the  ATD  version  based  on  the  Schwarz  method.  This 
is  because  the  Schwarz  based  ATD  algorithm  requires  inter-domain  communication, 
and  the  cost  of  setting  up  the  local  subproblems  using  the  local  Chebychev  nodes  is 
more  expensive  than  using  the  Chebychev  discretisation  on  the  whole  domain. 

Our  final  table  3  contains  results  of  tests  done  on  a  model  linear  convection- 
diffusion  problem: 

Ut  +  cU*  =  j^U„  +  /, 

with  appropriate  initial  and  boundary  values.  As  for  the  other  tests,  we  present  the 
results  only  for  the  solution  of  the  implicit  equations  that  are  obtained  at  each  time 
step.  We  use  the  algorithm  of  Kutznetsov,  which  does  not  contain  a  coarse  model,  and 
in  which  only  one  iteration  is  used.  Our  tests  using  Chebychev  collocation  on  the  local 
subdomains,  indicated  that  the  algorithm  is  sensitive  to  the  way  in  which  the  forcing 
is  decomposed.  However,  for  a  partitioning  without  steep  gradients,  the  algorithm  is 
expected  (based  on  theory)  to  perform  well.  The  results  listed  in  table  3  are  for  an 
application  of  Kutznetsov’s  algorithm  to  the  resulting  matrix,  where  a  decomposition 
of  the  domain  (or  unknowns  in  the  matrix  formulation)  is  obtained  by  grouping 
neighbouring  Chebychev  nodes  for  a  global  discretisation  of  the  problem.  No  local 
Chebychev  discretisations  were  used  in  these  tests.  Our  partition  of  the  unknowns 
was  based  on  first  forming  n  “subdomains”,  each  consisting  of  just  one  unknown 
(where  n  is  the  number  of  unknowns)  and  then  an  extension  is  formed  by  including 
N0  unknowns  each  from  both  sides.  We  list  the  error  in  the  computed  solution,  for 
various  choices  of  time  steps  At,  Reynolds  numbers  Re,  convection  speeds  c,  overlap 
parameter  Na  and  number  of  nodes  used  in  the  Chebychev  discretisation,  N.  Recall 
that  only  one  iteration  is  used  in  the  algorithm.  The  results  indicate  that  the  error 
decreases  as  the  amount  of  overlap  is  increased,  untill  the  error  becomes  0(Af ).  It  was 
found  that  for  moderate  choices  of  overlap,  the  error  remained  close  to  the  time  step, 
but  not  smaller.  This  indicates,  that  in  the  matrix  version  of  this  algorithm,  there  are 
some  constraints  to  the  accuracy.  And  so,  it  would  be  preferable  to  use  the  algorithm 
using  local  Chebychev  discretisations  and  for  a  smooth  partitioning  of  the  forcing. 
However,  it  must  be  mentioned  that  computational  cost  of  the  matrix  version  of  the 
algorithm  was  small.  For  finite  difference  schemes  the  two  versions  of  the  algorithm, 
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the  matrix  version  and  the  domain  decomposition  version,  are  equivalent  and  perform 
well,  see  Kutznetsov  [19]. 

7  Extensions  to  the  Navier- Stokes  equations. 

The  Schwarz  alternating  method  is  easily  generalised  and  applied  to  the  Stokes  equa¬ 
tions  and  the  incompressible  Navier-Stokes  equations,  see  Lions  [20],  Fortin  and 
Manouzi  [12],  and  Aboulaich  and  Fortin  [1].  In  the  case  of  the  Stokes  problem, 
convergence  proofs  are  known,  see  Lions  [20]. 

The  generalisation  of  the  nonlinear  Gauss-Seidel-Newton  method  to  the  Stokes 
and  Navier-Stokes  equations  has  been  done  for  certain  finite  difference  methods,  see 
Brandt  and  Dinar  [4],  and  Hackbusch  [16].  It  is  referred  to  as  a  Distributive  Gauss- 
Seidel  method.  The  extension  to  other  types  of  discretisations  requires  more  study. 

We  intend  to  study  and  test  Kutznetsov’s  method  for  the  case  of  the  Stokes  and 
Navier-Stokes  equations. 

r  8  Some  future  plans. 

»  Our  studies  have  led  us  to  focus  on  various  issues,  concerning  the  simulation  of  tur¬ 

bulence  in  the  incompressible  Navier-Stokes  equations,  in  order  to  reduce  the  compu¬ 
tational  cost.  Specifically,  we  would  like  to  study  how  to  adaptively  coarsen  or  refine 
the  mesh  in  space  as  the  minimum  scale  of  the  flow  changes  in  time.  This  can  be 
estimated  quantitatively  using  the  bound  for  the  minimum  scale  given  in  Henshaw, 
Kreiss  and  Reyna  [17]  which  depends  on  the  Reynolds  number  of  the  flow  based  on  the 
kinematic  viscosity  and  maximum  velocity  gradients.  In  addition,  local  refinement 
may  be  possible  based  on  local  bounds  for  the  maximum  velocity  gradients. 

For  regimes  of  the  flow  where  large-scale  structures  are  present,  we  would  like  to 
test  the  use  of  different  time  scales  for  the  large-scale  and  small-scale  components  of 
the  solution.  We  intend  to  study  these  in  the  context  of  the  Stokes  and  Navier-Stokes 
equations  in  two  dimensions. 

Finally,  we  would  also  like  to  focus  on  turbulent  flows  which  are  locally  homoge¬ 
neous,  in  which  case  it  may  be  possible  to  use  direct  simulation  on  a  few  subregions 

to  generate  a  model  for  the  small-scale  features  of  the  solution  on  the  whole  domain. 
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